Context

In this document, we explore the available data, gather insights and inspect patterns in order to answer the following question:

How can we statistically characterize the behavior of inflation across countries, and has it significantly changed since 2000?

More specifically, we would like to explore the following sub-objectives:

  1. Are inflation distributions flat or fat-tailed?
  2. Is inflation stationary, or are there seasonal patterns?
  3. How do crises influence inflation markers?
  4. Which indicators could be used to predict inflation rates?

Before diving into the exploration, we need to merge the two available datasets into a single one and perform some basic cleaning.

Show the code
library(readr)
library(dplyr)
library(tidyr)
library(DT)
library(sf)
library(maps)
library(plotly)
library(ggplot2)
library(leaflet)
library(cowplot)
library(rstatix)
library(ggbeeswarm)
library(RColorBrewer)
library(geepack)
library(RESI)

Preprocessing

Joining

As mentioned before, the first step is to merge both datasets. As shown below, there are several candidate merge keys:

  • ‘year’: the year of observation, ranging from 2000 to 2024.
  • ‘country’: the names, in English, of the 266 countries recorded.
  • ‘iso2c’ and ‘iso3c’, which are country identifiers.

As the ‘comp’ data is only complementary to ‘main’, a left join from ‘main’ is sensible. To avoid duplicated columns, we’ll join on all 4 identifiers.

Show the code
print(paste0("Length of the 'year' column in `main`: ", length(main$year)))
[1] "Length of the 'year' column in `main`: 6650"
Show the code
print(paste0("Length of the 'year' column in `comp`: ", length(comp$year)))
[1] "Length of the 'year' column in `comp`: 6650"
Show the code
print(
  paste0("'year' in main ranges from ", min(main$year), " to ", max(main$year)))
[1] "'year' in main ranges from 2000 to 2024"
Show the code
print(paste0("'year' in comp ranges from ", min(comp$year), " to ", max(comp$year)))
[1] "'year' in comp ranges from 2000 to 2024"
Show the code
print(paste0("Amount of recorded unique countries in `main`: ", length(unique(main$country)), " for ", length(main$country), " total observations"))
[1] "Amount of recorded unique countries in `main`: 266 for 6650 total observations"
Show the code
print(paste0("Amount of recorded unique countries in `comp`: ", length(unique(comp$country)), " for ", length(comp$country), " total observations"))
[1] "Amount of recorded unique countries in `comp`: 266 for 6650 total observations"
Show the code
print(paste0("Is Switzerland named 'Switzerland' (EN)? ", ("Switzerland" %in% unique(main$country)) & ("Switzerland" %in% unique(comp$country))))
[1] "Is Switzerland named 'Switzerland' (EN)? TRUE"
Show the code
join <- left_join(main, comp, by=c("year", "country", "iso2c", "iso3c"))
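
Before relying on such a join, it can help to check that the key columns identify each row uniquely and that no row of the left table is left without a partner. The following is a minimal sketch on toy data; `main_toy`, `comp_toy` and their values are hypothetical stand-ins for `main` and `comp`, not the real datasets.

```r
library(dplyr)

# Hypothetical toy tables standing in for `main` and `comp`
main_toy <- tibble(
  year    = c(2000, 2001, 2000, 2001),
  country = c("Freedonia", "Freedonia", "Sylvania", "Sylvania"),
  iso2c   = c("FR", "FR", "SY", "SY"),
  iso3c   = c("FRD", "FRD", "SYL", "SYL"),
  inflation_pct = c(1.2, 1.5, 8.0, 7.1)
)
comp_toy <- main_toy %>%
  select(-inflation_pct) %>%
  mutate(debt_gdp_pct = c(40, 42, NA, 90))

keys <- c("year", "country", "iso2c", "iso3c")

# Key combinations occurring more than once (ideally none)
dup_main <- main_toy %>% count(across(all_of(keys))) %>% filter(n > 1)
dup_comp <- comp_toy %>% count(across(all_of(keys))) %>% filter(n > 1)

# Rows of the left table without a partner in the right table (ideally none)
unmatched <- anti_join(main_toy, comp_toy, by = keys)

# With unique keys, the left join keeps exactly the rows of the left table
joined_toy <- left_join(main_toy, comp_toy, by = keys)
```

If `dup_main` or `dup_comp` were not empty, the join would multiply rows, so checking them first is a cheap safeguard.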

Cleaning

At first sight, some variables contain a lot of missing values. One approach would be to remove such variables altogether. However, this is only sensible if the variable is of no use for the analysis; we should have a reason before trimming down the dataset. The sensible way to proceed is therefore, first, to learn more about the predictors and the factors that matter for further analysis and, second, to deal with the missing values based on the insights gathered. Additionally, it is important to make sure that deleting a variable doesn’t introduce bias.

We currently have information about 266 countries. Moreover, as we plan to investigate differences between countries, we need some way to differentiate them. A good way of doing this is to determine which countries are considered ‘poor’ and which ‘rich’. For this purpose, we’ll use the World Bank classification. Although we could build a classification ourselves, the benefits of using a proven and validated method are clear. The classification is also useful to see whether country wealth influences the presence of missing values.

Country Wealth Classes

This classification is based on each country’s Gross National Income (GNI) per capita, converted using the Atlas conversion factor, which reduces the impact of exchange-rate fluctuations when comparing incomes between countries. For more information about the method, see World Bank Atlas Method.

We used the historical classification dataset and joined it with the already joined data. As the original dataset format (XLSX) was unsuited for direct conversion to CSV, we applied a small manual transformation procedure. The cleaning and transformation steps below ensure that the format of this dataset is compatible with that of the join dataset.

Cleaning

First, the wide-format dataset needs to be transformed into a long-format one. All columns except the country identifiers correspond to a year; they will be gathered under a year variable, and their values under a wealth_class categorical variable.

Then each year will be stripped of its ‘X’ prefix and converted to a number. Additionally, we’ll group the “..” and “” wealth categories together, since they both signify missing values, and label them as missing values.

Finally, we’ll join this dataset with the main dataset, join. However, the join dataset contains 48 more entries in its country column than the historical wealth classification. Indeed, the join dataset contains items that are not countries, such as ‘Low Middle Income’ or ‘World’, whereas the historical wealth classification dataset contains none. Those items aren’t useful in our work since we only look at countries, and not at any other grouping. This means that the wealth classification data offers a way of filtering them out. It also spans a longer time period, going back to 1987, compared to 2000 for the main dataset. Hence, we’ll do an inner join on country, iso3c and year.

Remark: Additional meta information about wealth_class can be found in data/raw/country_wealth_labels_meta.csv.

Show the code
wealth <- read.csv(paste0(dir, "data/raw/country_wealth_labels.csv"))

knitr::kable(wealth[1:5, 1:5])
X country X1987 X1988 X1989
AFG Afghanistan L L L
ALB Albania .. .. ..
DZA Algeria UM UM LM
ASM American Samoa H H H
AND Andorra .. .. ..
Show the code
## Cleaning pipeline for the wealth classification dataset
long_wealth <- wealth %>% 

    ## Renaming variables 
    rename("iso3c" = X) %>% 
    rename_with(.fn = ~ gsub("X", "", .x)) %>% 

    ## Gather all year columns into a single `year` variable,
    ## with the corresponding wealth class stored under `wealth_class`.
    pivot_longer(
        cols = colnames(.)[!(grepl("country|iso3c", colnames(.)))],
        names_to = "year",
        values_to = "wealth_class") %>% 

    ## Make sure that each date under `year` is numeric and not a string.
    mutate(year = as.numeric(year)) %>% 

    ## Take care of different labels used for missing values.
    mutate(
        wealth_class = sub("[.]{2}", "", wealth_class),
        wealth_class = na_if(wealth_class, ""))


knitr::kable(long_wealth[1:5, 1:4])
iso3c country year wealth_class
AFG Afghanistan 1987 L
AFG Afghanistan 1988 L
AFG Afghanistan 1989 L
AFG Afghanistan 1990 L
AFG Afghanistan 1991 L
Show the code
print(paste0("Countries amount difference between `join` and wealth classification dataset: ", length(unique(join$country)) - length(unique(long_wealth$country))))
[1] "Countries amount difference between `join` and wealth classification dataset: 48"
Show the code
join_labeled <- inner_join(join, long_wealth, by=c("country", "iso3c", "year"))
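
To see which entries an inner join of this kind discards, an anti join on the same keys can be inspected first. This is a minimal sketch on toy data; `join_toy` and `wealth_toy` are hypothetical stand-ins for `join` and `long_wealth`, and the aggregate names are only illustrative.

```r
library(dplyr)

# Hypothetical toy stand-ins for `join` and `long_wealth`
join_toy <- tibble(
  country = c("Freedonia", "World", "Low & middle income"),
  iso3c   = c("FRD", "WLD", "LMY"),
  year    = c(2000, 2000, 2000)
)
wealth_toy <- tibble(
  country = "Freedonia", iso3c = "FRD", year = 2000, wealth_class = "H"
)

keys <- c("country", "iso3c", "year")

# Rows of `join_toy` that have no match and would be dropped by inner_join()
dropped <- anti_join(join_toy, wealth_toy, by = keys)
dropped_names <- sort(unique(dropped$country))

# The inner join itself keeps only the matched rows
kept <- inner_join(join_toy, wealth_toy, by = keys)
```

Listing `dropped_names` on the real data would confirm that only aggregates, and no actual country with a slightly different spelling, are filtered out.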

Missing value across countries

The dataset contains a lot of missing values, particularly in certain variables such as debt_gdp_pct, with 76% of missing values, as well as net_enrol_primary and net_enrol_secondary, both being education enrollment measures.

The graph below maps those missing values across all countries, summed over the years. Since one observation corresponds to a single country-year, countries in red provide very little information. Additionally, even though European countries seem to contain fewer missing values than other countries, they are not exempt from them.

The question of how to deal with the missing values is tricky. Indeed, debt relative to GDP, which is an indicator of a country’s economic health, and education enrollment, whether primary or secondary, might reveal interesting relationships with inflation. According to a paper by Fukunaga, I., Komatsuzaki, T. and Matsuoka, H. (2020), inflation can alleviate debt, particularly relative to GDP, as higher inflation raises the prices of goods and services and thus nominal GDP as well.

Hence, on the one hand, we could drop these variables and lose what could be important influences on inflation; on the other hand, we could keep them but then only be able to focus on a handful of countries.
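
The trade-off can be made concrete by comparing the share of missing values per variable with the number of rows that would survive if incomplete rows were dropped. The sketch below uses base R on a hypothetical toy data frame (`toy`), not on `join_labeled`.

```r
# Hypothetical toy data frame standing in for `join_labeled`
toy <- data.frame(
  country       = c("A", "A", "B", "B"),
  inflation_pct = c(1.0, NA, 3.0, 4.0),
  debt_gdp_pct  = c(NA, NA, NA, 60)
)

# Share of missing values per variable, in percent
na_share <- round(100 * colMeans(is.na(toy)), 1)

# Rows remaining if every incomplete row were dropped
n_complete <- sum(complete.cases(toy))

# Rows remaining if the sparsest variable were dropped instead
n_complete_without_debt <- sum(
  complete.cases(toy[, setdiff(names(toy), "debt_gdp_pct")])
)
```

In this toy case, keeping `debt_gdp_pct` leaves a single complete row, while dropping it leaves three, which mirrors the dilemma described above.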

Show the code
world <- map_data("world")

na_per_country <- join_labeled %>% 
    group_by(country) %>% 
    summarise(across(everything(), ~ sum(is.na(.)))) %>% 
    rowwise() %>% 
    mutate(total_na = sum(c_across(where(is.numeric))))

polygon_and_na <- right_join(world, na_per_country, by=c("region" = "country"))
Show the code
plain <- theme(
  axis.text = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.title = element_blank(),
  panel.background = element_rect(fill = "white"),
  plot.title = element_text(hjust = 0.5)
)

worldplot <- polygon_and_na %>% 
    select(long, lat, group, debt_gdp_pct, net_enrol_primary, net_enrol_secondary) %>% 
    pivot_longer(
        cols = c(debt_gdp_pct, net_enrol_primary, net_enrol_secondary),
        names_to = "var",
        values_to = "missing_value_yearly") %>% 
    ggplot(
        mapping = aes(x = long, y = lat)) + 
    coord_fixed(1.3) +
    geom_polygon(
        mapping = aes(fill = missing_value_yearly, group = group)
    ) + 
    scale_fill_distiller(palette = "RdBu", direction=-1) +
    facet_wrap(~ var) +
    ggtitle("Total missing values")
Show the code
ggplotly(worldplot)

Exploration

Overall Inflation Trend

Our main point of focus is the inflation rate. Thus, let’s plot it for all available countries first. This variable has comparatively few missing values (a proportion of 0.164, i.e. 860 rows). Hence, we can just leave them aside for now.

Each plot fits a linear and a non-linear model to the data, and each successive plot zooms in further. Although fitting a linear model to time-series data isn’t well suited, as time-series data don’t respect the i.i.d. assumption of such models, it can still provide quick, useful information about trends.

An important remark is that the ‘zooming’ doesn’t exclude points; it merely zooms in. Consequently, the models are all fitted on the same data. Excluding points at each zoom level would introduce a selection bias, as wealthier countries tend to have a lower inflation rate and less inflation variability than poorer countries (see Inflation trend per wealth class).
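
The difference between zooming and excluding can be illustrated without any plotting. In ggplot2, `coord_cartesian(ylim = ...)` only crops the view, whereas setting limits on the scale (for example with `ylim()`) removes out-of-range values before the smoothers are fitted. The sketch below mimics both situations with `lm()` on a hypothetical toy series.

```r
# Hypothetical toy series: flat 2 % inflation with one extreme value in the last year
toy <- data.frame(year = 2000:2009, inflation_pct = c(rep(2, 9), 100))

# Fit on all points: this is what a cropped view (zoom) still shows
fit_all <- lm(inflation_pct ~ year, data = toy)

# Fit after excluding points above the visible range: NOT what zooming does
fit_cut <- lm(inflation_pct ~ year, data = subset(toy, inflation_pct <= 10))

slope_all <- coef(fit_all)[["year"]]
slope_cut <- coef(fit_cut)[["year"]]
```

Here `slope_all` is clearly positive because the extreme value pulls the line up, while `slope_cut` is essentially zero, so the two approaches would tell different stories about the trend.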

Overall, the inflation rate is low and stable. However, we can observe quite extreme values, most notably in the early 2000s and from 2015 onwards. This variation is captured well by the non-linear model, since it is more sensitive to such patterns, while the linear model stays flat.

Ultimately, this plot doesn’t provide much information.

Show the code
list_plot_zoom <- list()
for (scale in c(400, 200, 100, 50, 10)) {
    plot <- ggplot(
                data = join_labeled,
                mapping = aes(x = year, y = inflation_pct)) +
            geom_point(color = "#2b8cbe", alpha = 0.3) +
            geom_smooth(color = "black", method="lm") +
            geom_smooth(color = "blue", method = "gam") +
            coord_cartesian(ylim = c(0, scale)) +
            labs(x = "Year", y = "Inflation (%)") +
            ggtitle(" 'World-wide' inflation rate (2000-2024)") +
            theme_minimal()
    list_plot_zoom[[length(list_plot_zoom) + 1]] <-  plot
}

cowplot::plot_grid(
    list_plot_zoom[[1]],
    list_plot_zoom[[2]],
    list_plot_zoom[[3]],
    list_plot_zoom[[4]],
    list_plot_zoom[[5]],
    ncol = 3)
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 860 rows containing non-finite outside the scale range
(`stat_smooth()`).
`geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'
Warning: Removed 860 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 860 rows containing missing values or values outside the scale range
(`geom_point()`).

Inflation trend per wealth class

Previously, we incorporated another dataset providing a wealth classification for each country across the years (see Country Wealth Classes). The classification is mainly based on Gross National Income computed with the Atlas Method, which accounts for price increases.

Let’s now see how inflation developed across the years for each of those classes. We’ll fit a non-linear model to the data, as it shows trends more effectively than a simple linear model. The wealth_class variable hardly contains any missing values (a proportion of 0.022, i.e. 118 rows). However, we don’t want the plot to contain an ‘NA’ legend entry and, since missing values in wealth class and inflation represent a low percentage of the dataset, we’ll drop each row containing missing values. Naturally, this will be done after selecting the needed columns, so as not to drop too many rows.

This plot offers more detail than the previous one. Here, we can observe that, although the inflation rate was relatively low over the past 25 years, some countries experienced, and are still experiencing, comparatively high inflation (~20 %). Additionally, the richer the country, the lower its inflation.

Concerning trends, the poorest countries (L & LM) show a clear overall pattern over the past 25 years: their inflation rate decreased until around 2010 before increasing again. Upper-middle countries maintained a slow and steady inflation increase, while wealthier countries show a more varied pattern. They displayed a cyclical pattern, with inflation rising around major crises (2008, subprime crisis, and 2020, COVID crisis) before entering a phase of declining inflation.

Interestingly, only wealthy countries seem affected by the 2008 and 2020 crises, as if the term “Global” in Global Financial Crisis didn’t account for less developed countries.

Show the code
plot_data <- join_labeled %>% 
    select(year, country, wealth_class, inflation_pct) %>% 
    drop_na()

list_plot_class <- list()
for (scale in c(400, 200, 100, 50, 10)) {
    plot <- ggplot(
        data = plot_data,
        mapping = aes(
            x = year, 
            y = inflation_pct)) +
        geom_point(color = "#2b8cbe", alpha = 0.3) +
        geom_smooth(
            mapping = aes(colour = wealth_class), 
            method = "gam",
            alpha = 0) +
        coord_cartesian(ylim = c(0, scale)) +
        labs(x = "Year", y = "Inflation (%)") +
        ggtitle(" Wealth class inflation rate (2000-2024)") +
        theme_minimal()
    list_plot_class[[length(list_plot_class) + 1]] <-  plot
}
cowplot::plot_grid(
    list_plot_class[[1]],
    list_plot_class[[2]],
    list_plot_class[[3]],
    list_plot_class[[4]],
    list_plot_class[[5]],
    ncol = 3)
`geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

Besides the fact that richer countries have a lower inflation rate on average, intra-class variation also increases the poorer the country is. This could be partially explained by class imbalance: although the classes contain broadly similar numbers of unique countries, they differ in their number of observations.

Show the code
plot_data <- join_labeled %>% 
    select(year, country, wealth_class, inflation_pct) %>% 
    drop_na()

plot <- ggplot(
    data = plot_data,
    mapping = aes(
        x = year, 
        y = inflation_pct)) +
    geom_smooth(
        mapping = aes(colour = wealth_class), 
        method = "gam",
        alpha = 0.2) +
    coord_cartesian(ylim = c(0, 10)) +
    labs(x = "Year", y = "Inflation (%)") +
    ggtitle(" Variance in inflation rate across wealth classes (2000-2024)") +
    theme_minimal()

ggplotly(plot)
`geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

Intra Class Inflation Variation

Show the code
get_unique <- function(class){
    return(
        length(unique(join_labeled[join_labeled$wealth_class == class, ]$country)))
}

get_count <- function(class){
    return(
        length(join_labeled[join_labeled$wealth_class == class, ]$country))
}
print("Unique numbers of countries per class: ")
[1] "Unique numbers of countries per class: "
Show the code
print(sapply(c("H", "UM", "LM", "L"), get_unique))
  H  UM  LM   L 
 88  93 100  66 
Show the code
print("Number of observations per class: ")
[1] "Number of observations per class: "
Show the code
print(sapply(c("H", "UM", "LM", "L"), get_count))
   H   UM   LM    L 
1807 1312 1395 1090 
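
As a cross-check, the same counts can be obtained with a grouped summary. One point worth keeping in mind is that base R logical subsetting with `==` returns an all-NA row for every row whose `wealth_class` is missing, which can inflate counts. The sketch below uses a hypothetical toy data frame (`toy`) to show both behaviours; it is not the code used to produce the figures above.

```r
library(dplyr)

# Hypothetical toy data with two rows lacking a wealth class
toy <- data.frame(
  country      = c("A", "A", "B", "C", "C", "D"),
  wealth_class = c("H", "H", "H", "L", NA, NA)
)

# Grouped summary that leaves rows with a missing class out explicitly
class_counts <- toy %>%
  filter(!is.na(wealth_class)) %>%
  group_by(wealth_class) %>%
  summarise(n_countries = n_distinct(country), n_obs = n())

# For comparison: `==` subsetting keeps one all-NA row per missing class
n_base_H <- nrow(toy[toy$wealth_class == "H", ])
```

In the toy data, class "H" has 3 observations, yet `n_base_H` is 5 because the two rows with a missing class are carried along as NA rows.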

Wealthier countries displayed an interesting cyclical pattern with little divergence across countries. This pattern seems marked by phases of increasing and decreasing inflation driven by major crises. It would be interesting to see whether those phases have different distributional properties.

We observe the following two major crisis periods, both marked by an increasing inflation rate:

  • 2003 to 2008
  • 2017 to 2023

Show the code
join_labeled %>% 
    filter(wealth_class == "H") %>% 
    select(country, year, wealth_class, inflation_pct) %>% 
    drop_na() %>% 
    ggplot(
        mapping = aes(x = year, y = inflation_pct)) +
            geom_point(color = "#fee0d2") +
            geom_smooth(color = "#fc9272") +
            ggtitle("Inflation for wealthy countries") +
            coord_cartesian(xlim = NULL, ylim = c(0, 20)) +
            labs(x = "Year", y = "Inflation (%)") +
            theme_minimal()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

During crises, the inflation distribution has a slightly heavier tail. Computing the mean, standard deviation and median for both states didn’t show much of a difference at first. However, if we remove the extreme 60% inflation outlier in the is_crisis == FALSE condition, we obtain a much clearer difference. In conclusion, the crisis and non-crisis states seem to have different distributional properties.

Show the code
## Flag the crisis periods identified above (2003-2008 and 2017-2023).
## `%in%` is vectorised, so the function can be applied to a whole column.
is_crisis <- function(year) {
    year %in% c(2003:2008, 2017:2023)
}

join_crisis <- join_labeled %>% 
    mutate(is_crisis = is_crisis(year))

join_crisis %>% 
    select(country, year, wealth_class, inflation_pct, is_crisis) %>% 
    drop_na() %>% 
    filter(wealth_class == "H") %>% 
    ggplot(
        mapping = aes(is_crisis, inflation_pct, fill = is_crisis)) +
            geom_point(alpha = 0.3) +
            geom_violin(alpha = 0.6) +
            ggtitle("Inflation distribution - crisis dependent") +
            theme_minimal()

Inflation Distribution by Crisis Status
Show the code
join_crisis %>% 
    select(country, year, wealth_class, inflation_pct, is_crisis) %>% 
    drop_na() %>% 
    filter(wealth_class == "H" & inflation_pct < 60) %>% 
    group_by(is_crisis) %>% 
    summarise(
        mean = mean(inflation_pct),
        sd = sd(inflation_pct),
        median = median(inflation_pct)
    ) %>% 
    knitr::kable()
is_crisis mean sd median
FALSE 1.934951 2.049884 1.776871
TRUE 3.075433 3.058584 2.304466

Inflation Distribution by Crisis Status - Summary Table
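
Tail heaviness, mentioned above, can also be summarised by a single number such as the excess kurtosis (0 for a normal distribution, larger for heavier tails). The following is a minimal base R sketch; the helper `excess_kurtosis()` and the two toy vectors are hypothetical and not part of the original analysis.

```r
# Moment-based excess kurtosis: m4 / m2^2 - 3 (hypothetical helper)
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  centred <- x - mean(x)
  m2 <- mean(centred^2)
  m4 <- mean(centred^4)
  m4 / m2^2 - 3
}

# Toy inflation values: identical except for one extreme observation
calm   <- c(1.5, 1.8, 2.0, 2.1, 2.4, 1.9)
crisis <- c(1.5, 1.8, 2.0, 2.1, 2.4, 12.0)

k_calm   <- excess_kurtosis(calm)
k_crisis <- excess_kurtosis(crisis)
```

A single extreme value is enough to raise the statistic noticeably, which is why comparing it with and without the 60% outlier would be informative.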

Statistical Analysis

The previous section described many different patterns and insights. The goal of the present section is to confirm those insights statistically.

Do richer countries have a lower inflation rate?

Say that the inflation rate of each country is a series \(X\) of 25 observations collected over the years such that:

\[ X = \{x_1, x_2, x_3, \dots, x_{25}\} \]

\(X\) is auto-correlated since \(x_i\) depends on previous values. Indeed, if \(x_{22}\) is high, there is a higher chance that \(x_{23}\) will be high as well. As such, the observations in \(X\) aren’t independent.

Show the code
ch_acf <- acf(
    join_labeled %>% filter(country == "Switzerland") %>% select(inflation_pct),
    plot = FALSE)

print("Auto-correlation of Switzerland's inflation at different lag years:")
[1] "Auto-correlation of Switzerland's inflation at different lag years:"
Show the code
print(ch_acf)

Autocorrelations of series 'join_labeled %>% filter(country == "Switzerland") %>% select(inflation_pct)', by lag

     0      1      2      3      4      5      6      7      8      9     10 
 1.000  0.367  0.089  0.091  0.096  0.103 -0.054 -0.294 -0.190 -0.262 -0.236 
    11     12     13 
-0.176 -0.226 -0.167 
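
To make the quantity reported by `acf()` explicit, the lag-1 auto-correlation can be computed by hand and compared with the function’s output. The sketch below uses a hypothetical toy series `x`, not Switzerland’s data; the last line computes the usual approximate 95% band for a white-noise series, assuming 25 observations.

```r
# Hypothetical toy series of yearly inflation values
x <- c(1.2, 0.9, 0.6, 1.1, 2.4, 0.7, -0.5, 0.2, 0.0, -0.7)
n <- length(x)
m <- mean(x)

# Lag-1 auto-correlation: lagged cross-products over the total sum of squares
r1_manual <- sum((x[-n] - m) * (x[-1] - m)) / sum((x - m)^2)

# Same quantity from acf(); element 1 is lag 0, element 2 is lag 1
r1_acf <- acf(x, lag.max = 1, plot = FALSE)$acf[2]

# Approximate 95% band under the white-noise hypothesis for n = 25
band_25 <- qnorm(0.975) / sqrt(25)
```

With 25 observations the band is roughly ±0.39, so a lag-1 value such as 0.367 sits just inside it, in line with describing the auto-correlation as slight.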

Additionally, it is hard to assume that the inflation rate of each country is independent of the others. A particular example is shown below.

Show the code
ch_fr_cor <- cor(
    join_labeled %>% filter(country == "Switzerland") %>% select(inflation_pct),
    join_labeled %>% filter(country == "France") %>% select(inflation_pct),
)

print(paste0("Switzerland - France correlation: ", round(ch_fr_cor[1, 1], digits = 3)))
[1] "Switzerland - France correlation: 0.832"

More generally, the mean correlation between the inflation rates of all rich countries is 0.44. This is enough for us to be prudent and assume dependence on this side as well.

Show the code
wide_country <- join_labeled %>% 
    filter(wealth_class == "H") %>% 
    select(inflation_pct, year, country) %>%
    drop_na() %>%
    group_by(year) %>% 
    mutate(mean = mean(inflation_pct)) %>% 
    pivot_wider(id_cols = year, names_from = country, values_from = mean, values_fill = 0) %>% 
    ungroup() %>% 
    select(-year)

cor_matrix <- expand.grid(colnames(wide_country), colnames(wide_country))

correlation = c()
for (i in seq_len(nrow(cor_matrix))) {
    row = cor_matrix[i, ]
    x_country = row[["Var1"]]
    y_country = row[["Var2"]]
    x = join_labeled %>% filter(country == x_country) %>% select(inflation_pct)
    y = join_labeled %>% filter(country == y_country) %>% select(inflation_pct)
    p = cor(x[, 1], y[, 1], use="complete")
    correlation = append(correlation, p)
}

cor_matrix = cbind(cor_matrix, correlation)

mean_cor <- mean(cor_matrix[cor_matrix$correlation < 1, ]$correlation)

print(paste0("Mean pearson correlation between countries: ", round(mean_cor, digits=2)))
[1] "Mean pearson correlation between countries: 0.44"
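
The same kind of summary can be obtained more compactly from a correlation matrix. The sketch below is an alternative formulation on hypothetical toy data (`infl_wide`), not the code that produced the value above: `cor()` is applied to a year-by-country matrix and only the upper triangle is averaged, so each pair is counted once and the diagonal is left out by position rather than by filtering on `correlation < 1`.

```r
# Hypothetical wide matrix: one row per year, one column per country
infl_wide <- cbind(
  A = c(1, 2, 3, 4, 5),
  B = c(2, 4, 6, 8, 10),
  C = c(5, 4, 3, 2, 1)
)

# Pairwise Pearson correlations, ignoring missing values pair by pair
cor_mat <- cor(infl_wide, use = "pairwise.complete.obs")

# Average over distinct country pairs only (upper triangle, no diagonal)
mean_cor <- mean(cor_mat[upper.tri(cor_mat)])
```

Selecting by position keeps pairs of distinct countries that happen to correlate perfectly, such as A and B in this toy matrix, which a value-based filter would drop.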

Nonetheless, we try our best to meet this condition by randomly sampling observations in each group. This way, we reduce the possibility of cross-country and within-country influence on the inflation rate. This is a classic case for an unpaired test of a difference between groups. Common tests for this scenario are an analysis of variance (ANOVA) or its non-parametric counterparts.

In our case, as shown by the plots below, the data isn’t normally distributed. However, the observations of each wealth class seem to come from similarly shaped distributions (satisfying the assumption of identically distributed observations). Hence, a sensible choice is a non-parametric test; more specifically, a Kruskal-Wallis test, since we have 4 independent groups: “H”, “UM”, “LM” and “L”.

Show the code
ww_density <- ggplot(
    data = plot_data,
    mapping = aes(x = inflation_pct)) +
    geom_density(color = "#addd8e", linewidth = 0.75) +
    labs(x = "Inflation (%)", y = "Density") +
    ggtitle("'World-wide' Inflation Distribution") +
    theme_minimal()
Show the code
cls_density <- ggplot(
    data = plot_data,
    mapping = aes(x = inflation_pct, color = wealth_class)) +
    geom_density(linewidth = 0.75) +
    xlim(0, 400) +
    labs(x = "Inflation (%)", y = "Density") +
    ggtitle("Inflation Distribution by Wealth Class") +
    theme_minimal() + 
    facet_wrap(~wealth_class)

print(ww_density)

Show the code
print(cls_density)
Warning: Removed 318 rows containing non-finite outside the scale range
(`stat_density()`).

As per the Kruskal-Wallis test, there is a significant effect of the wealth class on the inflation rate. More specifically, according to the post-hoc Dunn test, all pairs of wealth classes differ significantly from each other except L and LM. Since the sample size is big, the p-value alone is a biased indicator, which is why we compute the effect size alongside it. Said effect size is considered large and thus supports the previous significance claims.

Conclusion: The richer the country, the lower the inflation rate. However, it is important to note that such insights need to be taken with a grain of salt, as the observations are not totally independent of each other.

Show the code
random_sample <- join_labeled %>% 
    select(wealth_class, country, inflation_pct) %>% 
    drop_na() %>% 
    group_by(wealth_class) %>% 
    slice_sample(prop=0.25) %>% 
    ungroup()

kw_model_s1 <- kruskal.test(inflation_pct ~ wealth_class, data = random_sample)
kw_p <- kw_model_s1$p.value

kw_eta_s1 <- random_sample %>% 
    kruskal_effsize(inflation_pct ~ wealth_class)


dunn_post_hoc <- dunn_test(random_sample, inflation_pct ~ wealth_class)

print(paste0("Kruskal-Wallis Test p-value: ", kw_p))
[1] "Kruskal-Wallis Test p-value: 7.99467323514369e-34"
Show the code
print(
    paste0(
        "Kruskal-Wallis Test effect size: ",
        round(kw_eta_s1$effsize, digits = 3),
        " with a ",
        kw_eta_s1$magnitude,
        " magnitude"))
[1] "Kruskal-Wallis Test effect size: 0.142 with a large magnitude"
Show the code
knitr::kable(dunn_post_hoc %>% select(.y., group1, group2, p.adj, p.adj.signif))
.y. group1 group2 p.adj p.adj.signif
inflation_pct H L 0.0000000 ****
inflation_pct H LM 0.0000000 ****
inflation_pct H UM 0.0000000 ****
inflation_pct L LM 0.8250435 ns
inflation_pct L UM 0.0007787 ***
inflation_pct LM UM 0.0007787 ***
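
The effect size reported by `kruskal_effsize()` is, per the rstatix documentation, the eta-squared based on the H statistic, \(\eta^2_H = (H - k + 1) / (n - k)\), with \(k\) groups and \(n\) observations. The sketch below reproduces this formula in base R on hypothetical toy data with perfectly separated groups; it is meant to show where the number comes from, not to replicate the result above.

```r
# Hypothetical toy data: 3 classes of 4 observations, perfectly separated
toy <- data.frame(
  inflation_pct = c(1.1, 1.9, 2.3, 2.8, 3.5, 4.2, 5.1, 6.4, 7.0, 9.3, 12.5, 15.8),
  wealth_class  = rep(c("H", "UM", "L"), each = 4)
)

kw <- kruskal.test(inflation_pct ~ wealth_class, data = toy)

H <- unname(kw$statistic)              # Kruskal-Wallis H statistic
k <- length(unique(toy$wealth_class))  # number of groups
n <- nrow(toy)                         # number of observations

# Eta-squared based on H
eta2_H <- (H - k + 1) / (n - k)
```

Unlike the p-value, this quantity does not shrink towards zero simply because the sample grows, which is why it complements the test on large samples.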

Do richer countries have a more stable inflation rate compared to poorer countries?

In Inflation trend per wealth class, we observed graphically how the variation in inflation differs between wealth classes (see Intra Class Inflation Variation).

In order to confirm statistically whether this observation holds, we’ll compare the standard deviation of inflation between classes.

Show the code
join_labeled %>% 
    select(wealth_class, inflation_pct, year) %>% 
    drop_na() %>% 
    group_by(wealth_class) %>% 
    summarise(sd = sd(inflation_pct)) %>% 
    knitr::kable()
wealth_class sd
H 3.152333
L 33.049704
LM 22.222786
UM 10.927054

As in the previous section, we don’t consider the inflation rates of a country to be independent, as shown by the slight auto-correlation demonstrated earlier. Hence, the same can be said of their standard deviation. Additionally, it is also unlikely for the inflation rates of different countries to be independent. Consequently, we’ll randomly select a sample from each class, across all countries and years measured. This limits the influence of countries on each other and reduces the auto-correlation between observations of the same country. The latter is further supported by the fact that said auto-correlation becomes negligible after 2-3 years.

Since the data isn’t normally distributed, we’ll use the non-parametric Kruskal-Wallis Test.

As per the Kruskal-Wallis test, the wealth class has a significant effect on the inflation standard deviation, with a large effect size. Each wealth class’s standard deviation is significantly different from the others. Note, however, that in the code below sd_infl_country is computed once per wealth class and is therefore constant within each class, so an effect size of 1 follows from this construction.

Conclusion: The richer the country, the more stable the inflation rate across countries. For the richer countries, which experienced strong perturbations during the 2008 and 2020 crises, this means that, although the inflation rate changed, every country moved in the same direction. However, it is important to note that such insights need to be taken with a grain of salt, as the observations are not totally independent of each other.

This could be the proof, or the consequence, of a highly globalized economy where each country depends on the others. As shown by the recent global economic turmoil, this can also mean global drawbacks.

Show the code
random_sample <- join_labeled %>% 
    select(wealth_class, country, inflation_pct) %>% 
    drop_na() %>% 
    group_by(wealth_class) %>% 
    slice_sample(prop=0.25) %>% 
    mutate(sd_infl_country = sd(inflation_pct)) %>% 
    ungroup() 

kw_model_s2 <- kruskal.test(sd_infl_country ~ wealth_class, data = random_sample)

kw_eta_s2 <- random_sample %>% 
    kruskal_effsize(sd_infl_country ~ wealth_class)

dunn_post_hoc <- dunn_test(random_sample, sd_infl_country ~ wealth_class)

print(paste0("Kruskal-Wallis Test p-value: ", kw_model_s2$p.value))
Show the code
print(
    paste0(
        "Kruskal-Wallis Test effect size: ",
        round(kw_eta_s2$effsize, digits = 3),
        " with a ",
        kw_eta_s2$magnitude,
        " magnitude"))
[1] "Kruskal-Wallis Test effect size: 1 with a large magnitude"
Show the code
knitr::kable(dunn_post_hoc %>% select(.y., group1, group2, p.adj, p.adj.signif))
.y. group1 group2 p.adj p.adj.signif
sd_infl_country H L 0 ****
sd_infl_country H LM 0 ****
sd_infl_country H UM 0 ****
sd_infl_country L LM 0 ****
sd_infl_country L UM 0 ****
sd_infl_country LM UM 0 ****
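
One possible variant, which is not the approach used above, is to compute one standard deviation per country, so that each wealth class contributes a distribution of values instead of a single number. The sketch below shows the idea on hypothetical toy data (`toy`); grouping by both class and country is an assumption made here to handle countries that change class over time.

```r
library(dplyr)

# Hypothetical toy panel: 4 countries, 3 yearly observations each
toy <- tibble(
  country       = rep(c("A", "B", "C", "D"), each = 3),
  wealth_class  = rep(c("H", "H", "L", "L"), each = 3),
  inflation_pct = c(1, 2, 3,  2, 2.5, 3,  5, 15, 40,  8, 20, 12)
)

# One standard deviation per country (within its wealth class)
sd_by_country <- toy %>%
  group_by(wealth_class, country) %>%
  summarise(sd_infl = sd(inflation_pct), .groups = "drop")

# Kruskal-Wallis test on the per-country standard deviations
kw_sd <- kruskal.test(sd_infl ~ wealth_class, data = sd_by_country)
```

With this construction the test compares how volatile individual countries are within each class, and the effect size is no longer fixed at 1 by design.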

Can inflation rate be used to predict crisis state?

In Inflation trend per wealth class, we observed differences in mean, median and standard deviation between inflation during a crisis period and outside of a crisis period for wealthy countries. Consequently,

  • Are those differences statistically significant?
  • If yes, can they be used to predict the crisis state given an inflation rate?

As in the two previous sections, and as shown by @crisis-inflation-distribution, we are dealing with non-normal, long-tailed distributions. Hence, we need a non-parametric test for two independent groups. Most non-parametric tests assume independence of observations in one form or another. Currently, we have inflation rates collected during crises and outside crises. Both groups were collected from the same countries, which makes the two groups dependent on each other. Additionally, as shown in Do richer countries have lower inflation rate?, each country's inflation is auto-correlated. We can also assume some degree of dependence between the inflation rates of different countries. Hence, not only is there between-group dependence, there is also some degree of within-group dependence.

To ensure independence of observations between groups, we split all wealthy countries into two groups. The first group is used to measure the inflation rate during crises, while the second one is used for the inflation rate outside of crises.

To ensure independence of observations within groups, each group's observations are chosen randomly. This mitigates the time dependence, since the auto-correlation becomes negligible after 2-3 years. Additionally, choosing random observations from randomly chosen countries lessens the probability of observations influencing each other.
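As an illustration of why a few years of separation can be enough, the sketch below computes the theoretical autocorrelation of an AR(1) process at increasing lags. The coefficient `phi` and the `threshold` are hypothetical values chosen for the example, not estimates from the inflation data.

```r
## Hypothetical AR(1) coefficient; NOT estimated from the inflation data
phi <- 0.5

## Theoretical autocorrelation of an AR(1) process at lags 0 to 5
rho <- ARMAacf(ar = phi, lag.max = 5)
print(round(rho, 3))

## Smallest lag at which the autocorrelation falls below a chosen threshold
threshold <- 0.2                                          # assumption for the example
first_negligible_lag <- min(which(rho < threshold)) - 1   # positions start at lag 0
print(first_negligible_lag)
```

Under these assumed values the autocorrelation halves at every lag, so observations a few years apart are only weakly related. The same check could be run on each country's series with `acf()`.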

As the result below shows, inflation rates differ significantly within and outside crises. This confirms the values obtained in @crisis-inflation-summary and shows that it may be possible to infer whether wealthy countries are in a crisis situation based on their inflation rate. This could prove useful in helping countries and central banks adapt their policies.

Show the code
get_group <- function(id) {
    if (id %in% seq(49)) {
        group <- 1
    } else {
        group <- 2
    }
    return(group)
}

by_crisis_h <- join_crisis %>% 
    filter(wealth_class == "H") %>% 
    select(country, year, inflation_pct, is_crisis) %>% 
    drop_na() %>% 
    group_by(is_crisis)

by_crisis_group_h <- by_crisis_h %>% 
    ## Map each country to a unique ID  
    mutate(country_id = as.integer(factor(country))) %>% 

    ## Randomize country order for group attribution
    slice(sample(1:n())) %>% 

    ## Attribute a group to each country
    mutate(group = sapply(country_id, get_group)) %>% 
    
    ## Sort them back in order
    arrange(country)

## Randomly choose 25% of each group for within-group independence.
by_crisis_group_h_rand <- by_crisis_group_h %>% 
    group_by(group) %>% 
    slice_sample(prop=0.25) %>% 
    ungroup()

## Select the two non-overlapping conditions for between-group independence.
x <- by_crisis_group_h_rand %>% filter(group == 1 & is_crisis == TRUE) %>% select(inflation_pct)
y <- by_crisis_group_h_rand %>% filter(group == 2 & is_crisis == FALSE) %>% select(inflation_pct)

unpaired_wilcox <- wilcox.test(
    x[[1]], y[[1]],
    paired = FALSE
    )

print(paste0("Mann-Whitney U Test p-values: ", unpaired_wilcox$p.value))
[1] "Mann-Whitney U Test p-values: 0.000274611949664646"
Generalized Estimating Equations

We previously identified a pattern that, for wealthy countries, can be used to infer whether or not an economy is in a crisis period: the median of the inflation rate distribution differs significantly between the two states. Exploiting the binary nature of the crisis state (either we are in a crisis, or we are not) and this significant difference in medians, we fit a Generalized Estimating Equations (GEE) model, which works well with non-Gaussian and correlated data.

The result shows that inflation can indeed be used to predict the crisis state. Each additional point of inflation increases the log-odds of crisis versus non-crisis, \(\log\frac{P(Y=1)}{P(Y=0)}\), by 0.2552. Since \(e^{0.2552} \approx 1.29\), the odds of being in a crisis state are multiplied by about 1.29 for each additional point, that is, a 29% increase in the odds (not in the probability itself). This relationship is significant, and the effect size of the model is good.
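The conversion from coefficients to odds and probabilities can be sketched as follows. The two coefficients are copied from the model summary in this section; the inflation levels and the names `beta0`, `beta1`, `odds_ratio`, `inflation_levels` and `prob_crisis` are assumptions made for this example.

```r
## Coefficients copied from the geeglm summary in this section
beta0 <- -0.67529   # intercept: log-odds of crisis at 0% inflation
beta1 <-  0.25524   # change in log-odds per additional point of inflation

## Odds ratio for a one-point increase in inflation
odds_ratio <- exp(beta1)
print(round(odds_ratio, 2))

## Predicted crisis probability at a few illustrative inflation levels
## (the levels are assumptions chosen for the example)
inflation_levels <- c(0, 2, 5, 10)
prob_crisis <- plogis(beta0 + beta1 * inflation_levels)

print(data.frame(
    inflation_pct = inflation_levels,
    prob_crisis = round(prob_crisis, 3)
))
```

The odds ratio is the same at every inflation level, whereas the change in probability depends on the starting point, which is why the coefficient is best read as a change in odds.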

Conclusion: Inflation can indeed be a valid indicator of the crisis state for rich countries. The investigation was not conducted for poorer countries, since no clear response to the common historical crises was found in their inflation rates.

Show the code
gee_data <- join_crisis %>% 
    select(country, year, wealth_class, inflation_pct, is_crisis) %>% 
    filter(wealth_class == "H" & inflation_pct < 60) %>% 
    mutate(country = factor(country)) %>% 
    drop_na()

geeglm <- geeglm(
        is_crisis ~ inflation_pct,
        family = binomial,
        data = gee_data,
        id = country,
        corstr = "ar1"
    )

summary(geeglm)

Call:
geeglm(formula = is_crisis ~ inflation_pct, family = binomial, 
    data = gee_data, id = country, corstr = "ar1")

 Coefficients:
              Estimate  Std.err   Wald Pr(>|W|)    
(Intercept)   -0.67529  0.06719 101.02   <2e-16 ***
inflation_pct  0.25524  0.02625  94.52   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = ar1 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)    1.045  0.0361
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha   0.5525 0.01974
Number of clusters:   68  Maximum cluster size: 25 
Show the code
print(resi(geeglm))

Analysis of effect sizes based on RESI:
Confidence level =  0.05
Call:  geeglm(formula = is_crisis ~ inflation_pct, family = binomial, 
    data = gee_data, id = country, corstr = "ar1")

Coefficient Table 
              Estimate Std. Error z value Pr(>|z|) L-RESI CS-RESI L 2.5%
(Intercept)     -0.675      0.067 -10.051        0 -1.219  -0.117 -1.509
inflation_pct    0.255      0.026   9.722        0  1.179   0.155  0.981
              L 97.5% CS 2.5% CS 97.5%
(Intercept)    -0.989  -0.138   -0.096
inflation_pct   1.550   0.120    0.189


Analysis of 'Wald statistic' Table
Model: binomial, link: logit
Response: is_crisis
Terms added sequentially (first to last)

              Df   X2 P(>|Chi|) L-RESI CS-RESI L 2.5% L 97.5% CS 2.5% CS 97.5%
inflation_pct  1 94.5         0   1.17       0  0.973    1.54       0    0.039

Overall RESI comparing model to intercept-only model:

  Df    X2 P(>|Chi|) L-RESI L 2.5% L 97.5%
1  1 94.52         0  1.173  0.973   1.545

Notes:
1. The RESI was calculated using a robust covariance estimator.
2. Confidence intervals (CIs) constructed using 1000 non-parametric bootstraps. 
3. The bootstrap was successful in 1000 out of 1000 attempts. 
Show the code
logistic_f <- function(x){
    value <- (1 / (1 + exp(-geeglm$coefficients[1] - geeglm$coefficients[2]*x)))
    return(value)
}
gee_data %>% 
    mutate(y = as.integer(is_crisis)) %>% 
    ggplot(
        data = .,
        mapping = aes(x = inflation_pct, y = y)) +
        geom_point(color = "#a6bddb", alpha = 0.3) +
        geom_smooth(
            mapping = aes(y = logistic_f(inflation_pct)),
            color = "#1c9099",
            alpha = 0
        ) +
        labs(x = "Inflation (%)", y = "Crisis State") +
        ggtitle("GEE fit") +
        theme_minimal()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Sources

Datasets and indicators: World Bank Group